Overview

Dataset statistics

Number of variables: 18
Number of observations: 403776
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 55.5 MiB
Average record size in memory: 144.0 B
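
The memory figures can be cross-checked with a few lines of arithmetic. The sketch below assumes every column costs 8 bytes per row (64-bit numbers, or 8-byte object pointers for the categorical columns when measured shallowly); that per-cell size is an assumption, not something the report states.

```python
# Figures copied from the dataset statistics.
N_ROWS = 403_776
N_COLS = 18

# Assumption: 8 bytes per cell (float64/int64 values, or 8-byte object
# pointers for the categorical columns when memory is measured shallowly).
BYTES_PER_CELL = 8

bytes_per_row = N_COLS * BYTES_PER_CELL
total_mib = N_ROWS * bytes_per_row / 1024**2
column_mib = N_ROWS * BYTES_PER_CELL / 1024**2

print(f"record size: {bytes_per_row} B")
print(f"total size:  {total_mib:.1f} MiB")
print(f"per column:  {column_mib:.1f} MiB")
```

Under that assumption the record size, the total size, and the 3.1 MiB reported for each variable are all mutually consistent.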

Variable types

Numeric: 15
Categorical: 3

Warnings

REF_NO is highly correlated with year [High correlation]
year is highly correlated with REF_NO [High correlation]
PM2.5 is highly correlated with PM10 and 2 other fields [High correlation]
PM10 is highly correlated with PM2.5 and 2 other fields [High correlation]
SO2 is highly correlated with CO [High correlation]
NO2 is highly correlated with PM2.5 and 2 other fields [High correlation]
CO is highly correlated with PM2.5 and 3 other fields [High correlation]
O3 is highly correlated with TEMP [High correlation]
TEMP is highly correlated with O3 and 2 other fields [High correlation]
PRES is highly correlated with TEMP and 1 other field [High correlation]
DEWP is highly correlated with TEMP and 1 other field [High correlation]
REF_NO is highly correlated with year [High correlation]
year is highly correlated with REF_NO [High correlation]
PM2.5 is highly correlated with PM10 and 2 other fields [High correlation]
PM10 is highly correlated with PM2.5 and 2 other fields [High correlation]
SO2 is highly correlated with NO2 and 1 other field [High correlation]
NO2 is highly correlated with PM2.5 and 4 other fields [High correlation]
CO is highly correlated with PM2.5 and 3 other fields [High correlation]
O3 is highly correlated with NO2 and 1 other field [High correlation]
TEMP is highly correlated with O3 and 2 other fields [High correlation]
PRES is highly correlated with TEMP and 1 other field [High correlation]
DEWP is highly correlated with TEMP and 1 other field [High correlation]
REF_NO is highly correlated with year [High correlation]
year is highly correlated with REF_NO [High correlation]
PM2.5 is highly correlated with PM10 and 1 other field [High correlation]
PM10 is highly correlated with PM2.5 and 1 other field [High correlation]
NO2 is highly correlated with CO [High correlation]
CO is highly correlated with PM2.5 and 2 other fields [High correlation]
TEMP is highly correlated with PRES and 1 other field [High correlation]
PRES is highly correlated with TEMP and 1 other field [High correlation]
DEWP is highly correlated with TEMP and 1 other field [High correlation]
year is highly correlated with REF_NO [High correlation]
PM10 is highly correlated with CO and 2 other fields [High correlation]
month is highly correlated with PRES and 3 other fields [High correlation]
PRES is highly correlated with month and 3 other fields [High correlation]
REF_NO is highly correlated with year and 4 other fields [High correlation]
TEMP is highly correlated with month and 3 other fields [High correlation]
CO is highly correlated with PM10 and 2 other fields [High correlation]
PM2.5 is highly correlated with PM10 and 2 other fields [High correlation]
DEWP is highly correlated with month and 3 other fields [High correlation]
NO2 is highly correlated with PM10 and 2 other fields [High correlation]
RAIN is highly skewed (γ1 = 29.4497644) [Skewed]
REF_NO is uniformly distributed [Uniform]
station is uniformly distributed [Uniform]
hour has 16824 (4.2%) zeros [Zeros]
RAIN has 387119 (95.9%) zeros [Zeros]
WSPM has 10891 (2.7%) zeros [Zeros]
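
The skewness and zeros warnings rest on two simple quantities. The sketch below is a minimal illustration: `skewness` uses the plain moment formula γ1 = m3 / m2^1.5 (an assumption here; statistical libraries often apply an additional small-sample correction), and `rain_like` is a hypothetical toy sample, not data from the report.

```python
def skewness(values):
    """Moment-based skewness: third central moment over variance**1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


def zero_share(zeros, total):
    """Percentage of observations that are exactly zero."""
    return 100.0 * zeros / total


# Hypothetical toy sample: mostly zeros with a few large values, like RAIN.
rain_like = [0.0] * 95 + [0.1, 0.2, 0.5, 2.0, 10.0]
symmetric = [1, 2, 3, 4, 5]

print(skewness(symmetric))   # symmetric data has zero skewness
print(skewness(rain_like))   # a long right tail gives a large positive value

# Zero shares recomputed from the counts in the warnings.
N_OBS = 403_776
for name, zeros in [("hour", 16_824), ("RAIN", 387_119), ("WSPM", 10_891)]:
    print(name, f"{zero_share(zeros, N_OBS):.1f}%")
```

Note that the zeros in hour are simply the midnight observations (1 hour in 24), so that warning does not indicate a data problem.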

Reproduction

Analysis started: 2021-08-19 12:22:42.888990
Analysis finished: 2021-08-19 12:30:41.848384
Duration: 7 minutes and 58.96 seconds
Software version: pandas-profiling v3.0.0
Download configuration: config.json

Variables

REF_NO
Real number (ℝ≥0)

HIGH CORRELATION
UNIFORM

Distinct: 33648
Distinct (%): 8.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 16824.5
Minimum: 1
Maximum: 33648
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 3.1 MiB

Quantile statistics

Minimum: 1
5-th percentile: 1683
Q1: 8412.75
median: 16824.5
Q3: 25236.25
95-th percentile: 31966
Maximum: 33648
Range: 33647
Interquartile range (IQR): 16823.5

Descriptive statistics

Standard deviation: 9713.352953
Coefficient of variation (CV): 0.5773338258
Kurtosis: -1.200000002
Mean: 16824.5
Median Absolute Deviation (MAD): 8412
Skewness: 0
Sum: 6793329312
Variance: 94349225.58
Monotonicity: Not monotonic
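
Because REF_NO takes each integer from 1 to 33648 exactly 12 times, its summary statistics follow from the closed-form results for a discrete uniform distribution. The sketch below recomputes them; treating the reported variance as the sample (n - 1) variance is an assumption that the figures happen to support.

```python
N_VALUES = 33_648   # REF_NO runs from 1 to 33648
REPEATS = 12        # each value occurs 12 times
n_obs = N_VALUES * REPEATS

# Sum of 1..N is N(N+1)/2; each value is repeated REPEATS times.
total = REPEATS * N_VALUES * (N_VALUES + 1) // 2
mean = total / n_obs

# Population variance of a discrete uniform on 1..N is (N^2 - 1) / 12.
pop_var = (N_VALUES**2 - 1) / 12
# Assumption: the report uses the sample variance (divisor n - 1).
sample_var = pop_var * n_obs / (n_obs - 1)
std = sample_var ** 0.5
cv = std / mean

print(total, mean, round(sample_var, 2), round(std, 6), round(cv, 10))
```

The kurtosis of about -1.2 and the skewness of 0 are likewise what a uniform distribution is expected to produce.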
Histogram with fixed size bins (bins=50)
Value                 Count   Frequency (%)
2047                  12      < 0.1%
12113                 12      < 0.1%
16211                 12      < 0.1%
1876                  12      < 0.1%
3925                  12      < 0.1%
5974                  12      < 0.1%
8023                  12      < 0.1%
26456                 12      < 0.1%
28505                 12      < 0.1%
30554                 12      < 0.1%
Other values (33638)  403656  > 99.9%

Value  Count  Frequency (%)
1      12     < 0.1%
2      12     < 0.1%
3      12     < 0.1%
4      12     < 0.1%
5      12     < 0.1%
6      12     < 0.1%
7      12     < 0.1%
8      12     < 0.1%
9      12     < 0.1%
10     12     < 0.1%

Value  Count  Frequency (%)
33648  12     < 0.1%
33647  12     < 0.1%
33646  12     < 0.1%
33645  12     < 0.1%
33644  12     < 0.1%
33643  12     < 0.1%
33642  12     < 0.1%
33641  12     < 0.1%
33640  12     < 0.1%
33639  12     < 0.1%

year
Categorical

HIGH CORRELATION

Distinct: 4
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 3.1 MiB

Value  Count
2016   105408
2015   105120
2014   105120
2013   88128

Length

Max length: 4
Median length: 4
Mean length: 4
Min length: 4

Characters and Unicode

Total characters: 1615104
Distinct characters: 7
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2013
2nd row: 2013
3rd row: 2013
4th row: 2013
5th row: 2013

Common Values

Value  Count   Frequency (%)
2016   105408  26.1%
2015   105120  26.0%
2014   105120  26.0%
2013   88128   21.8%
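
The uneven year counts are what an hourly series over a partial first year and a leap year would give. The sketch below assumes the series runs hourly from 2013-03-01 00:00 to 2016-12-31 23:00 (the timestamps of the first and last sample rows) and that each of the 12 stations has one record per hour; both are inferences from the report rather than stated facts.

```python
from collections import Counter
from datetime import datetime, timedelta

N_STATIONS = 12                      # distinct values of `station`
start = datetime(2013, 3, 1, 0)      # assumed from the first sample row
end = datetime(2016, 12, 31, 23)     # assumed from the last sample row

# Count the hourly timestamps that fall in each calendar year.
hours_per_year = Counter()
stamp = start
while stamp <= end:
    hours_per_year[stamp.year] += 1
    stamp += timedelta(hours=1)

rows_per_year = {
    year: hours * N_STATIONS for year, hours in sorted(hours_per_year.items())
}
total_rows = sum(rows_per_year.values())

print(rows_per_year)
print(total_rows)
```

2013 is short because it starts in March, and 2016 has one extra day (24 hours x 12 stations = 288 rows) because it is a leap year.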

Length

Histogram of lengths of the category

Pie chart

Value  Count   Frequency (%)
2016   105408  26.1%
2014   105120  26.0%
2015   105120  26.0%
2013   88128   21.8%

Most occurring characters

Value  Count   Frequency (%)
2      403776  25.0%
0      403776  25.0%
1      403776  25.0%
6      105408  6.5%
4      105120  6.5%
5      105120  6.5%
3      88128   5.5%

Most occurring categories

Value           Count    Frequency (%)
Decimal Number  1615104  100.0%

Most frequent character per category

Decimal Number
Value  Count   Frequency (%)
2      403776  25.0%
0      403776  25.0%
1      403776  25.0%
6      105408  6.5%
4      105120  6.5%
5      105120  6.5%
3      88128   5.5%

Most occurring scripts

Value   Count    Frequency (%)
Common  1615104  100.0%

Most frequent character per script

Common
Value  Count   Frequency (%)
2      403776  25.0%
0      403776  25.0%
1      403776  25.0%
6      105408  6.5%
4      105120  6.5%
5      105120  6.5%
3      88128   5.5%

Most occurring blocks

Value  Count    Frequency (%)
ASCII  1615104  100.0%

Most frequent character per block

ASCII
Value  Count   Frequency (%)
2      403776  25.0%
0      403776  25.0%
1      403776  25.0%
6      105408  6.5%
4      105120  6.5%
5      105120  6.5%
3      88128   5.5%

month
Real number (ℝ≥0)

HIGH CORRELATION

Distinct12
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean6.735378031
Minimum1
Maximum12
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size3.1 MiB

Quantile statistics

Minimum1
5-th percentile1
Q14
median7
Q310
95-th percentile12
Maximum12
Range11
Interquartile range (IQR)6

Descriptive statistics

Standard deviation3.356479072
Coefficient of variation (CV)0.4983356623
Kurtosis-1.157296025
Mean6.735378031
Median Absolute Deviation (MAD)3
Skewness-0.0532691034
Sum2719584
Variance11.26595176
MonotonicityNot monotonic
Histogram with fixed size bins (bins=12)
ValueCountFrequency (%)
1235712
8.8%
1035712
8.8%
835712
8.8%
735712
8.8%
535712
8.8%
335712
8.8%
1134560
8.6%
934560
8.6%
634560
8.6%
434560
8.6%
Other values (2)51264
12.7%
ValueCountFrequency (%)
126784
6.6%
224480
6.1%
335712
8.8%
434560
8.6%
535712
8.8%
634560
8.6%
735712
8.8%
835712
8.8%
934560
8.6%
1035712
8.8%
ValueCountFrequency (%)
1235712
8.8%
1134560
8.6%
1035712
8.8%
934560
8.6%
835712
8.8%
735712
8.8%
634560
8.6%
535712
8.8%
434560
8.6%
335712
8.8%

day
Real number (ℝ≥0)

Distinct31
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean15.74821683
Minimum1
Maximum31
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size3.1 MiB

Quantile statistics

Minimum1
5-th percentile2
Q18
median16
Q323
95-th percentile29
Maximum31
Range30
Interquartile range (IQR)15

Descriptive statistics

Standard deviation8.808891484
Coefficient of variation (CV)0.5593580262
Kurtosis-1.195325155
Mean15.74821683
Median Absolute Deviation (MAD)8
Skewness0.005682826695
Sum6358752
Variance77.59656917
MonotonicityNot monotonic
Histogram with fixed size bins (bins=31)
ValueCountFrequency (%)
1613248
 
3.3%
1513248
 
3.3%
213248
 
3.3%
313248
 
3.3%
413248
 
3.3%
513248
 
3.3%
613248
 
3.3%
713248
 
3.3%
813248
 
3.3%
913248
 
3.3%
Other values (21)271296
67.2%
ValueCountFrequency (%)
113248
3.3%
213248
3.3%
313248
3.3%
413248
3.3%
513248
3.3%
613248
3.3%
713248
3.3%
813248
3.3%
913248
3.3%
1013248
3.3%
ValueCountFrequency (%)
317776
1.9%
3012384
3.1%
2912672
3.1%
2813248
3.3%
2713248
3.3%
2613248
3.3%
2513248
3.3%
2413248
3.3%
2313248
3.3%
2213248
3.3%

hour
Real number (ℝ≥0)

ZEROS

Distinct24
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean11.5
Minimum0
Maximum23
Zeros16824
Zeros (%)4.2%
Negative0
Negative (%)0.0%
Memory size3.1 MiB

Quantile statistics

Minimum0
5-th percentile1
Q15.75
median11.5
Q317.25
95-th percentile22
Maximum23
Range23
Interquartile range (IQR)11.5

Descriptive statistics

Standard deviation6.922195124
Coefficient of variation (CV)0.6019300108
Kurtosis-1.204173965
Mean11.5
Median Absolute Deviation (MAD)6
Skewness0
Sum4643424
Variance47.91678534
MonotonicityNot monotonic
Histogram with fixed size bins (bins=24)
ValueCountFrequency (%)
2316824
 
4.2%
2216824
 
4.2%
116824
 
4.2%
216824
 
4.2%
316824
 
4.2%
416824
 
4.2%
516824
 
4.2%
616824
 
4.2%
716824
 
4.2%
816824
 
4.2%
Other values (14)235536
58.3%
ValueCountFrequency (%)
016824
4.2%
116824
4.2%
216824
4.2%
316824
4.2%
416824
4.2%
516824
4.2%
616824
4.2%
716824
4.2%
816824
4.2%
916824
4.2%
ValueCountFrequency (%)
2316824
4.2%
2216824
4.2%
2116824
4.2%
2016824
4.2%
1916824
4.2%
1816824
4.2%
1716824
4.2%
1616824
4.2%
1516824
4.2%
1416824
4.2%

PM2.5
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 866
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 79.32702142
Minimum: 2
Maximum: 999
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 3.1 MiB

Quantile statistics

Minimum: 2
5-th percentile: 6
Q1: 21
median: 57
Q3: 109
95-th percentile: 237
Maximum: 999
Range: 997
Interquartile range (IQR): 88

Descriptive statistics

Standard deviation: 78.31352866
Coefficient of variation (CV): 0.9872238648
Kurtosis: 5.907034808
Mean: 79.32702142
Median Absolute Deviation (MAD): 40
Skewness: 1.992182434
Sum: 32030347.4
Variance: 6133.008772
Monotonicity: Not monotonic
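
Several of the PM2.5 figures are derived from others in the same block. As a rough sketch, the standard definitions (CV = standard deviation / mean, IQR = Q3 - Q1, range = maximum - minimum, variance = standard deviation squared) reproduce the reported values; the variable names are illustrative.

```python
# Inputs copied from the PM2.5 statistics.
mean = 79.32702142
std = 78.31352866
q1, q3 = 21, 109
minimum, maximum = 2, 999

cv = std / mean                    # coefficient of variation
iqr = q3 - q1                      # interquartile range
value_range = maximum - minimum    # range
variance = std ** 2                # variance from the standard deviation

print(f"CV={cv:.10f} IQR={iqr} range={value_range} variance={variance:.6f}")
```

A CV close to 1 together with a mean (79.3) well above the median (57) is consistent with the positive skewness reported for this variable.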
Histogram with fixed size bins (bins=50)
Value               Count   Frequency (%)
83                  10228   2.5%
3                   8354    2.1%
10                  6609    1.6%
11                  6418    1.6%
9                   6374    1.6%
12                  6346    1.6%
8                   6333    1.6%
13                  5830    1.4%
14                  5765    1.4%
7                   5742    1.4%
Other values (856)  335777  83.2%

Value  Count  Frequency (%)
2      7      < 0.1%
3      8354   2.1%
4      3221   0.8%
4.3    2      < 0.1%
4.4    1      < 0.1%
4.6    1      < 0.1%
5      3984   1.0%
6      5116   1.3%
7      5742   1.4%
7.2    1      < 0.1%

Value  Count  Frequency (%)
999    1      < 0.1%
957    1      < 0.1%
941    1      < 0.1%
898    1      < 0.1%
882    1      < 0.1%
881    1      < 0.1%
857    1      < 0.1%
844    1      < 0.1%
826    1      < 0.1%
821    1      < 0.1%

PM10
Real number (ℝ≥0)

HIGH CORRELATION

Distinct1048
Distinct (%)0.3%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean103.9992444
Minimum2
Maximum999
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size3.1 MiB

Quantile statistics

Minimum2
5-th percentile11
Q137
median83
Q3144
95-th percentile275
Maximum999
Range997
Interquartile range (IQR)107

Descriptive statistics

Standard deviation89.47779529
Coefficient of variation (CV)0.8603696673
Kurtosis5.885451335
Mean103.9992444
Median Absolute Deviation (MAD)51
Skewness1.839084868
Sum41992398.9
Variance8006.27585
MonotonicityNot monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
838144
 
2.0%
64712
 
1.2%
53547
 
0.9%
183523
 
0.9%
143493
 
0.9%
163405
 
0.8%
173383
 
0.8%
133349
 
0.8%
203336
 
0.8%
243240
 
0.8%
Other values (1038)363644
90.1%
ValueCountFrequency (%)
2103
 
< 0.1%
3719
 
0.2%
4264
 
0.1%
53547
0.9%
5.42
 
< 0.1%
5.61
 
< 0.1%
64712
1.2%
6.41
 
< 0.1%
6.61
 
< 0.1%
72245
0.6%
ValueCountFrequency (%)
9993
< 0.1%
9951
 
< 0.1%
9931
 
< 0.1%
9921
 
< 0.1%
9911
 
< 0.1%
9881
 
< 0.1%
9871
 
< 0.1%
9861
 
< 0.1%
9841
 
< 0.1%
9831
 
< 0.1%

SO2
Real number (ℝ≥0)

HIGH CORRELATION

Distinct685
Distinct (%)0.2%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean15.54324847
Minimum0.2856
Maximum500
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size3.1 MiB

Quantile statistics

Minimum0.2856
5-th percentile2
Q12.2848
median7
Q319
95-th percentile60
Maximum500
Range499.7144
Interquartile range (IQR)16.7152

Descriptive statistics

Standard deviation21.53958095
Coefficient of variation (CV)1.385783737
Kurtosis14.36912969
Mean15.54324847
Median Absolute Deviation (MAD)5
Skewness3.050025452
Sum6275990.695
Variance463.9535476
MonotonicityNot monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
297027
24.0%
331771
 
7.9%
722415
 
5.6%
420810
 
5.2%
517091
 
4.2%
615762
 
3.9%
812722
 
3.2%
910952
 
2.7%
1010096
 
2.5%
118863
 
2.2%
Other values (675)156267
38.7%
ValueCountFrequency (%)
0.285689
 
< 0.1%
0.571270
 
< 0.1%
0.856872
 
< 0.1%
13221
 
0.8%
1.142484
 
< 0.1%
1.42894
 
< 0.1%
1.713683
 
< 0.1%
1.9992110
 
< 0.1%
297027
24.0%
2.11
 
< 0.1%
ValueCountFrequency (%)
5003
< 0.1%
4111
 
< 0.1%
3411
 
< 0.1%
3151
 
< 0.1%
3141
 
< 0.1%
3101
 
< 0.1%
2991
 
< 0.1%
2821
 
< 0.1%
2781
 
< 0.1%
2771
 
< 0.1%

NO2
Real number (ℝ≥0)

HIGH CORRELATION

Distinct1210
Distinct (%)0.3%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean50.35278459
Minimum1.0265
Maximum290
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size3.1 MiB

Quantile statistics

Minimum1.0265
5-th percentile8
Q124
median44
Q370
95-th percentile116
Maximum290
Range288.9735
Interquartile range (IQR)46

Descriptive statistics

Standard deviation34.25747322
Coefficient of variation (CV)0.6803491305
Kurtosis1.338853417
Mean50.35278459
Median Absolute Deviation (MAD)22
Skewness1.068509369
Sum20331245.95
Variance1173.574471
MonotonicityNot monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
50.3527845911859
 
2.9%
165572
 
1.4%
225556
 
1.4%
205523
 
1.4%
175467
 
1.4%
185441
 
1.3%
265420
 
1.3%
215416
 
1.3%
195368
 
1.3%
145366
 
1.3%
Other values (1200)342788
84.9%
ValueCountFrequency (%)
1.02653
 
< 0.1%
1.23182
 
< 0.1%
1.43712
 
< 0.1%
1.64243
 
< 0.1%
1.84771
 
< 0.1%
24364
1.1%
2.0531
 
< 0.1%
2.25833
 
< 0.1%
2.46361
 
< 0.1%
2.66892
 
< 0.1%
ValueCountFrequency (%)
2901
< 0.1%
2851
< 0.1%
2801
< 0.1%
2772
< 0.1%
2731
< 0.1%
2701
< 0.1%
2691
< 0.1%
2651
< 0.1%
2641
< 0.1%
2632
< 0.1%

CO
Real number (ℝ≥0)

HIGH CORRELATION

Distinct132
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean1199.044874
Minimum100
Maximum10000
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size3.1 MiB

Quantile statistics

Minimum100
5-th percentile200
Q1500
median900
Q31500
95-th percentile3399
Maximum10000
Range9900
Interquartile range (IQR)1000

Descriptive statistics

Standard deviation1097.868685
Coefficient of variation (CV)0.9156193478
Kurtosis10.15730245
Mean1199.044874
Median Absolute Deviation (MAD)400
Skewness2.653987532
Sum484145543
Variance1205315.65
MonotonicityNot monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
90040916
 
10.1%
30030662
 
7.6%
40029849
 
7.4%
50028043
 
6.9%
60027189
 
6.7%
70025720
 
6.4%
80022728
 
5.6%
100019026
 
4.7%
20017370
 
4.3%
110017009
 
4.2%
Other values (122)145264
36.0%
ValueCountFrequency (%)
1005091
 
1.3%
1501
 
< 0.1%
20017370
4.3%
30030662
7.6%
3501
 
< 0.1%
40029849
7.4%
50028043
6.9%
60027189
6.7%
70025720
6.4%
80022728
5.6%
ValueCountFrequency (%)
1000051
< 0.1%
990025
< 0.1%
980024
< 0.1%
970023
< 0.1%
960023
< 0.1%
950022
< 0.1%
940025
< 0.1%
930031
< 0.1%
920031
< 0.1%
910031
< 0.1%

O3
Real number (ℝ≥0)

HIGH CORRELATION

Distinct1597
Distinct (%)0.4%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean57.69670856
Minimum0.2142
Maximum1071
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size3.1 MiB

Quantile statistics

Minimum0.2142
5-th percentile2
Q112
median45
Q382
95-th percentile178
Maximum1071
Range1070.7858
Interquartile range (IQR)70

Descriptive statistics

Standard deviation56.49177386
Coefficient of variation (CV)0.9791160582
Kurtosis6.394631357
Mean57.69670856
Median Absolute Deviation (MAD)34.29
Skewness1.680004333
Sum23296546.2
Variance3191.320514
MonotonicityNot monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
240544
 
10.0%
4515583
 
3.9%
38245
 
2.0%
47636
 
1.9%
16878
 
1.7%
56129
 
1.5%
65641
 
1.4%
84796
 
1.2%
74642
 
1.1%
103940
 
1.0%
Other values (1587)299742
74.2%
ValueCountFrequency (%)
0.2142134
 
< 0.1%
0.4284119
 
< 0.1%
0.6426118
 
< 0.1%
0.8568120
 
< 0.1%
16878
1.7%
1.071138
 
< 0.1%
1.2852147
 
< 0.1%
1.4994166
 
< 0.1%
1.7136125
 
< 0.1%
1.9278147
 
< 0.1%
ValueCountFrequency (%)
107114
< 0.1%
10501
 
< 0.1%
10261
 
< 0.1%
6741
 
< 0.1%
6731
 
< 0.1%
5005
 
< 0.1%
4501
 
< 0.1%
4441
 
< 0.1%
4321
 
< 0.1%
4291
 
< 0.1%

TEMP
Real number (ℝ)

HIGH CORRELATION

Distinct1188
Distinct (%)0.3%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean14.08889947
Minimum-19.9
Maximum41.6
Zeros2642
Zeros (%)0.7%
Negative55474
Negative (%)13.7%
Memory size3.1 MiB

Quantile statistics

Minimum-19.9
5-th percentile-4
Q14
median15.4
Q323.5
95-th percentile30.7
Maximum41.6
Range61.5
Interquartile range (IQR)19.5

Descriptive statistics

Standard deviation11.29983762
Coefficient of variation (CV)0.8020383452
Kurtosis-1.086168918
Mean14.08889947
Median Absolute Deviation (MAD)9.4
Skewness-0.1687530123
Sum5688759.474
Variance127.6863303
MonotonicityNot monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
33342
 
0.8%
12796
 
0.7%
02642
 
0.7%
22556
 
0.6%
-12436
 
0.6%
-22293
 
0.6%
-41844
 
0.5%
41772
 
0.4%
51680
 
0.4%
-51633
 
0.4%
Other values (1178)380782
94.3%
ValueCountFrequency (%)
-19.91
< 0.1%
-19.71
< 0.1%
-19.51
< 0.1%
-18.91
< 0.1%
-18.71
< 0.1%
-18.51
< 0.1%
-18.11
< 0.1%
-17.91
< 0.1%
-17.41
< 0.1%
-17.31
< 0.1%
ValueCountFrequency (%)
41.61
 
< 0.1%
41.42
 
< 0.1%
41.13
 
< 0.1%
412
 
< 0.1%
40.91
 
< 0.1%
40.62
 
< 0.1%
40.58
< 0.1%
40.43
 
< 0.1%
40.34
< 0.1%
40.22
 
< 0.1%

PRES
Real number (ℝ≥0)

HIGH CORRELATION

Distinct677
Distinct (%)0.2%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean1010.282534
Minimum982.4
Maximum1042.8
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size3.1 MiB

Quantile statistics

Minimum982.4
5-th percentile994.6
Q11002
median1009.8
Q31018.3
95-th percentile1027.4
Maximum1042.8
Range60.4
Interquartile range (IQR)16.3

Descriptive statistics

Standard deviation10.35337882
Coefficient of variation (CV)0.01024800338
Kurtosis-0.7814634981
Mean1010.282534
Median Absolute Deviation (MAD)8.2
Skewness0.151997731
Sum407927840.5
Variance107.192453
MonotonicityNot monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
10192712
 
0.7%
10182695
 
0.7%
10212691
 
0.7%
10152602
 
0.6%
10232596
 
0.6%
10202570
 
0.6%
10172554
 
0.6%
10162528
 
0.6%
10222474
 
0.6%
10242455
 
0.6%
Other values (667)377899
93.6%
ValueCountFrequency (%)
982.42
 
< 0.1%
982.72
 
< 0.1%
982.83
< 0.1%
982.92
 
< 0.1%
9834
< 0.1%
983.24
< 0.1%
983.33
< 0.1%
983.42
 
< 0.1%
983.56
< 0.1%
983.64
< 0.1%
ValueCountFrequency (%)
1042.82
 
< 0.1%
1042.41
 
< 0.1%
1042.32
 
< 0.1%
1042.21
 
< 0.1%
104211
< 0.1%
1041.88
< 0.1%
1041.71
 
< 0.1%
1041.67
< 0.1%
1041.52
 
< 0.1%
1041.48
< 0.1%

DEWP
Real number (ℝ)

HIGH CORRELATION

Distinct646
Distinct (%)0.2%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean3.157291447
Minimum-43.4
Maximum29.1
Zeros828
Zeros (%)0.2%
Negative168595
Negative (%)41.8%
Memory size3.1 MiB

Quantile statistics

Minimum-43.4
5-th percentile-19.4
Q1-8
median4.1
Q315.5
95-th percentile22.2
Maximum29.1
Range72.5
Interquartile range (IQR)23.5

Descriptive statistics

Standard deviation13.61273596
Coefficient of variation (CV)4.311523402
Kurtosis-1.076908329
Mean3.157291447
Median Absolute Deviation (MAD)11.7
Skewness-0.2501055805
Sum1274838.511
Variance185.3065803
MonotonicityNot monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
17.61559
 
0.4%
171519
 
0.4%
17.21490
 
0.4%
16.81483
 
0.4%
17.31455
 
0.4%
17.11445
 
0.4%
17.81440
 
0.4%
16.21429
 
0.4%
18.21426
 
0.4%
17.51409
 
0.3%
Other values (636)389121
96.4%
ValueCountFrequency (%)
-43.41
 
< 0.1%
-361
 
< 0.1%
-35.71
 
< 0.1%
-35.51
 
< 0.1%
-35.37
< 0.1%
-35.19
< 0.1%
-356
< 0.1%
-34.92
 
< 0.1%
-34.87
< 0.1%
-34.62
 
< 0.1%
ValueCountFrequency (%)
29.12
 
< 0.1%
291
 
< 0.1%
28.810
< 0.1%
28.712
< 0.1%
28.62
 
< 0.1%
28.512
< 0.1%
28.414
< 0.1%
28.314
< 0.1%
28.29
< 0.1%
28.19
< 0.1%

RAIN
Real number (ℝ≥0)

SKEWED
ZEROS

Distinct254
Distinct (%)0.1%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean0.06705178246
Minimum0
Maximum72.5
Zeros387119
Zeros (%)95.9%
Negative0
Negative (%)0.0%
Memory size3.1 MiB

Quantile statistics

Minimum0
5-th percentile0
Q10
median0
Q30
95-th percentile0
Maximum72.5
Range72.5
Interquartile range (IQR)0

Descriptive statistics

Standard deviation0.8375740317
Coefficient of variation (CV)12.49145065
Kurtosis1292.745862
Mean0.06705178246
Median Absolute Deviation (MAD)0
Skewness29.4497644
Sum27073.90052
Variance0.7015302586
MonotonicityNot monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
0387119
95.9%
0.13689
 
0.9%
0.21823
 
0.5%
0.31374
 
0.3%
0.4885
 
0.2%
0.5847
 
0.2%
0.6698
 
0.2%
0.7585
 
0.1%
0.9502
 
0.1%
0.8482
 
0.1%
Other values (244)5772
 
1.4%
ValueCountFrequency (%)
0387119
95.9%
0.06705178246261
 
0.1%
0.13689
 
0.9%
0.21823
 
0.5%
0.31374
 
0.3%
0.4885
 
0.2%
0.5847
 
0.2%
0.6698
 
0.2%
0.7585
 
0.1%
0.8482
 
0.1%
ValueCountFrequency (%)
72.53
< 0.1%
52.12
 
< 0.1%
47.71
 
< 0.1%
46.46
< 0.1%
45.92
 
< 0.1%
41.91
 
< 0.1%
40.73
< 0.1%
391
 
< 0.1%
38.91
 
< 0.1%
37.42
 
< 0.1%

wd
Categorical

Distinct16
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size3.1 MiB
NE
41438 
ENE
33262 
N
29973 
NW
29587 
E
29168 
Other values (11)
240348 

Length

Max length3
Median length2
Mean length2.237356851
Min length1

Characters and Unicode

Total characters903391
Distinct characters4
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowNNW
2nd rowN
3rd rowNNW
4th rowNW
5th rowN

Common Values

ValueCountFrequency (%)
NE41438
 
10.3%
ENE33262
 
8.2%
N29973
 
7.4%
NW29587
 
7.3%
E29168
 
7.2%
NNE27247
 
6.7%
SW27083
 
6.7%
NNW24167
 
6.0%
WNW23815
 
5.9%
ESE23691
 
5.9%
Other values (6)114345
28.3%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
ne41438
 
10.3%
ene33262
 
8.2%
n29973
 
7.4%
nw29587
 
7.3%
e29168
 
7.2%
nne27247
 
6.7%
sw27083
 
6.7%
nnw24167
 
6.0%
wnw23815
 
5.9%
ese23691
 
5.9%
Other values (6)114345
28.3%

Most occurring characters

ValueCountFrequency (%)
N260903
28.9%
E248114
27.5%
W207045
22.9%
S187329
20.7%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter903391
100.0%

Most frequent character per category

Uppercase Letter
ValueCountFrequency (%)
N260903
28.9%
E248114
27.5%
W207045
22.9%
S187329
20.7%

Most occurring scripts

ValueCountFrequency (%)
Latin903391
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
N260903
28.9%
E248114
27.5%
W207045
22.9%
S187329
20.7%

Most occurring blocks

ValueCountFrequency (%)
ASCII903391
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
N260903
28.9%
E248114
27.5%
W207045
22.9%
S187329
20.7%

WSPM
Real number (ℝ≥0)

ZEROS

Distinct115
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean1.718192017
Minimum0
Maximum13.2
Zeros10891
Zeros (%)2.7%
Negative0
Negative (%)0.0%
Memory size3.1 MiB

Quantile statistics

Minimum0
5-th percentile0.3
Q10.9
median1.4
Q32.2
95-th percentile4.2
Maximum13.2
Range13.2
Interquartile range (IQR)1.3

Descriptive statistics

Standard deviation1.237624097
Coefficient of variation (CV)0.7203060454
Kurtosis3.695959966
Mean1.718192017
Median Absolute Deviation (MAD)0.6
Skewness1.626099043
Sum693764.7
Variance1.531713406
MonotonicityNot monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
1.121486
 
5.3%
121370
 
5.3%
1.221228
 
5.3%
0.920237
 
5.0%
1.319640
 
4.9%
0.818585
 
4.6%
1.418014
 
4.5%
0.716969
 
4.2%
1.516273
 
4.0%
1.615098
 
3.7%
Other values (105)214876
53.2%
ValueCountFrequency (%)
010891
2.7%
0.14175
 
1.0%
0.24378
 
1.1%
0.32673
 
0.7%
0.47154
 
1.8%
0.510842
2.7%
0.613881
3.4%
0.716969
4.2%
0.818585
4.6%
0.920237
5.0%
ValueCountFrequency (%)
13.21
 
< 0.1%
12.91
 
< 0.1%
12.81
 
< 0.1%
11.81
 
< 0.1%
11.71
 
< 0.1%
11.23
< 0.1%
111
 
< 0.1%
10.93
< 0.1%
10.71
 
< 0.1%
10.53
< 0.1%

station
Categorical

UNIFORM

Distinct12
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size3.1 MiB
Dingling
33648 
Wanshouxigong
33648 
Guanyuan
33648 
Huairou
33648 
Wanliu
33648 
Other values (7)
235536 

Length

Max length13
Median length7.5
Mean length8.416666667
Min length6

Characters and Unicode

Total characters3398448
Distinct characters26
Distinct categories2 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowAotizhongxin
2nd rowAotizhongxin
3rd rowAotizhongxin
4th rowAotizhongxin
5th rowAotizhongxin

Common Values

ValueCountFrequency (%)
Dingling33648
8.3%
Wanshouxigong33648
8.3%
Guanyuan33648
8.3%
Huairou33648
8.3%
Wanliu33648
8.3%
Aotizhongxin33648
8.3%
Dongsi33648
8.3%
Nongzhanguan33648
8.3%
Shunyi33648
8.3%
Tiantan33648
8.3%
Other values (2)67296
16.7%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
changping33648
8.3%
dongsi33648
8.3%
shunyi33648
8.3%
gucheng33648
8.3%
nongzhanguan33648
8.3%
guanyuan33648
8.3%
huairou33648
8.3%
dingling33648
8.3%
wanliu33648
8.3%
aotizhongxin33648
8.3%
Other values (2)67296
16.7%

Most occurring characters

ValueCountFrequency (%)
n639312
18.8%
i370128
10.9%
g370128
10.9%
a336480
9.9%
u302832
8.9%
o235536
 
6.9%
h201888
 
5.9%
t67296
 
2.0%
z67296
 
2.0%
x67296
 
2.0%
Other values (16)740256
21.8%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter2994672
88.1%
Uppercase Letter403776
 
11.9%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
n639312
21.3%
i370128
12.4%
g370128
12.4%
a336480
11.2%
u302832
10.1%
o235536
 
7.9%
h201888
 
6.7%
t67296
 
2.2%
z67296
 
2.2%
x67296
 
2.2%
Other values (7)336480
11.2%
Uppercase Letter
ValueCountFrequency (%)
D67296
16.7%
G67296
16.7%
W67296
16.7%
A33648
8.3%
C33648
8.3%
H33648
8.3%
N33648
8.3%
S33648
8.3%
T33648
8.3%

Most occurring scripts

ValueCountFrequency (%)
Latin3398448
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
n639312
18.8%
i370128
10.9%
g370128
10.9%
a336480
9.9%
u302832
8.9%
o235536
 
6.9%
h201888
 
5.9%
t67296
 
2.0%
z67296
 
2.0%
x67296
 
2.0%
Other values (16)740256
21.8%

Most occurring blocks

ValueCountFrequency (%)
ASCII3398448
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
n639312
18.8%
i370128
10.9%
g370128
10.9%
a336480
9.9%
u302832
8.9%
o235536
 
6.9%
h201888
 
5.9%
t67296
 
2.0%
z67296
 
2.0%
x67296
 
2.0%
Other values (16)740256
21.8%

Interactions

Correlations

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
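
As a minimal sketch of that calculation in plain Python (the function name `pearson_r` and the toy data are illustrative, not taken from the report):

```python
from math import sqrt


def pearson_r(xs, ys):
    """Covariance of xs and ys divided by the product of their std devs."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with >= 2 items")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # The 1/n (or 1/(n-1)) factors cancel between numerator and denominator,
    # so plain sums of products are enough.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(ss_x * ss_y)


xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r = pearson_r(xs, ys)
# Shifting and rescaling one variable leaves r unchanged.
r_rescaled = pearson_r([10 * x + 3 for x in xs], ys)
print(r, r_rescaled)
```

The second call illustrates the invariance under changes in location and scale mentioned above.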

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
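
A small self-contained sketch of this rank-then-correlate procedure follows. The helper names are illustrative, and assigning tied values the average of their ranks is an assumption (it is a common convention, but the report does not say how ties are handled).

```python
from math import sqrt


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / sqrt(sum((x - mx) ** 2 for x in xs)
                      * sum((y - my) ** 2 for y in ys))


def spearman_rho(xs, ys):
    """Pearson correlation applied to the rank variables."""
    return pearson_r(average_ranks(xs), average_ranks(ys))


xs = [1, 2, 3, 4, 5]
ys = [1, 4, 9, 16, 25]   # monotonic but nonlinear in xs
print(spearman_rho(xs, ys), pearson_r(xs, ys))
```

On this toy data ρ reaches 1 because the relationship is perfectly monotonic, while r stays below 1 because it is not linear.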

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
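
The pair-counting definition translates directly into code. This sketch implements exactly the formula in the text (often called tau-a); libraries commonly report a tie-adjusted variant instead, so results can differ when the data contain ties. The function name is illustrative.

```python
from itertools import combinations


def kendall_tau_a(xs, ys):
    """(concordant - discordant) / total number of pairs."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with >= 2 items")
    concordant = discordant = 0
    for i, j in combinations(range(len(xs)), 2):
        direction = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if direction > 0:
            concordant += 1      # both variables move the same way
        elif direction < 0:
            discordant += 1      # the variables move in opposite ways
        # direction == 0 means a tie, which counts for neither group
    total_pairs = len(xs) * (len(xs) - 1) // 2
    return (concordant - discordant) / total_pairs


print(kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4]))
```

In the example, one of the six pairs is discordant, so τ = (5 - 1) / 6. The loop is quadratic in the number of observations, which is fine for illustration but slow on a dataset of this size.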

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available for it.

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use a bias-corrected measure proposed by Bergsma in 2013.
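
As an illustration of the bias-corrected estimator, the sketch below computes it from a contingency table of counts. It follows the correction as usually stated (φ² = χ²/n is reduced by (r-1)(c-1)/(n-1), and the row and column counts r and c are shrunk analogously); it is not the report's own implementation, and the function name and toy tables are hypothetical.

```python
from math import sqrt


def cramers_v_corrected(table):
    """Bias-corrected Cramér's V for a table of counts (list of rows).

    Assumes n > 1, at least 2 rows and 2 columns, and no empty row or column.
    """
    n = sum(sum(row) for row in table)
    r, c = len(table), len(table[0])
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(r)) for j in range(c)]

    # Pearson chi-squared statistic against the independence expectation.
    chi2 = 0.0
    for i in range(r):
        for j in range(c):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected

    phi2 = chi2 / n
    phi2_corr = max(0.0, phi2 - (r - 1) * (c - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    c_corr = c - (c - 1) ** 2 / (n - 1)
    return sqrt(phi2_corr / min(r_corr - 1, c_corr - 1))


perfect = [[10, 0], [0, 10]]      # each row determines the column
independent = [[5, 5], [5, 5]]    # rows carry no information
print(cramers_v_corrected(perfect), cramers_v_corrected(independent))
```

The two toy tables hit the ends of the 0 to 1 range described above.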

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.

Sample

First rows

   REF_NO  year  month  day  hour  PM2.5  PM10  SO2   NO2   CO     O3    TEMP  PRES    DEWP   RAIN  wd   WSPM  station
0  1       2013  3      1    0     4.0    4.0   4.0   7.0   300.0  77.0  -0.7  1023.0  -18.8  0.0   NNW  4.4   Aotizhongxin
1  2       2013  3      1    1     8.0    8.0   4.0   7.0   300.0  77.0  -1.1  1023.2  -18.2  0.0   N    4.7   Aotizhongxin
2  3       2013  3      1    2     7.0    7.0   5.0   10.0  300.0  73.0  -1.1  1023.5  -18.2  0.0   NNW  5.6   Aotizhongxin
3  4       2013  3      1    3     6.0    6.0   11.0  11.0  300.0  72.0  -1.4  1024.5  -19.4  0.0   NW   3.1   Aotizhongxin
4  5       2013  3      1    4     3.0    3.0   12.0  12.0  300.0  72.0  -2.0  1025.2  -19.5  0.0   N    2.0   Aotizhongxin
5  6       2013  3      1    5     5.0    5.0   18.0  18.0  400.0  66.0  -2.2  1025.6  -19.6  0.0   N    3.7   Aotizhongxin
6  7       2013  3      1    6     3.0    3.0   18.0  32.0  500.0  50.0  -2.6  1026.5  -19.1  0.0   NNE  2.5   Aotizhongxin
7  8       2013  3      1    7     3.0    6.0   19.0  41.0  500.0  43.0  -1.6  1027.4  -19.1  0.0   NNW  3.8   Aotizhongxin
8  9       2013  3      1    8     3.0    6.0   16.0  43.0  500.0  45.0  0.1   1028.3  -19.2  0.0   NNW  4.1   Aotizhongxin
9  10      2013  3      1    9     3.0    8.0   12.0  28.0  400.0  59.0  1.2   1028.5  -19.3  0.0   N    2.6   Aotizhongxin

Last rows

        REF_NO  year  month  day  hour  PM2.5  PM10   SO2   NO2    CO      O3   TEMP  PRES    DEWP  RAIN  wd   WSPM  station
403766  33639   2016  12     31   14    399.0  412.0  31.0  198.0  4900.0  6.0  3.8   1021.9  -8.9  0.0   SSE  1.0   Wanshouxigong
403767  33640   2016  12     31   15    449.0  524.0  30.0  217.0  5600.0  8.0  3.9   1021.5  -6.1  0.0   S    1.4   Wanshouxigong
403768  33641   2016  12     31   16    440.0  440.0  26.0  200.0  4700.0  6.0  2.8   1021.5  -6.6  0.0   SSE  0.7   Wanshouxigong
403769  33642   2016  12     31   17    378.0  378.0  20.0  171.0  3800.0  4.0  1.2   1021.4  -5.5  0.0   SSE  1.1   Wanshouxigong
403770  33643   2016  12     31   18    392.0  458.0  14.0  160.0  3900.0  3.0  -1.3  1021.9  -6.5  0.0   S    0.6   Wanshouxigong
403771  33644   2016  12     31   19    449.0  487.0  10.0  153.0  4500.0  4.0  -1.9  1022.0  -6.1  0.0   ESE  0.9   Wanshouxigong
403772  33645   2016  12     31   20    460.0  492.0  12.0  146.0  4100.0  4.0  -2.5  1022.4  -5.5  0.0   ENE  0.7   Wanshouxigong
403773  33646   2016  12     31   21    463.0  498.0  12.0  141.0  4400.0  5.0  -3.0  1022.1  -5.3  0.0   E    0.9   Wanshouxigong
403774  33647   2016  12     31   22    493.0  537.0  12.0  124.0  5000.0  8.0  -3.0  1022.7  -5.0  0.0   SW   0.1   Wanshouxigong
403775  33648   2016  12     31   23    464.0  490.0  8.0   111.0  5400.0  7.0  -4.0  1022.6  -5.7  0.0   ENE  0.9   Wanshouxigong